Various Approaches in Text Pre-processing
Abstract
Text mining, an increasingly important field of research in Knowledge Discovery in Data (KDD), concentrates on discovering hidden patterns, rules, regularities and trends in textual data, such as natural language speech or web documents. The structure of textual data is implicit, unlike the structured data that is stored in databases. This difference in the nature of the data is the major distinction between text mining and traditional data mining (the main stage of KDD). However, it is feasible to obtain knowledge from texts with data mining techniques if the textual data can be rationally converted into a structured or semi-structured form. In this paper, we summarize the various approaches to text preprocessing. Our work is presented with the aim of supporting future work in text mining research.
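As a rough illustration of what converting textual data into a structured form can look like, the sketch below tokenizes raw documents, drops stop words, and builds a term-count table that a conventional data mining method could consume. It is not taken from the paper: the function names (`tokenize`, `preprocess`, `to_term_matrix`) and the tiny stop-word list are hypothetical, and the bag-of-words representation is only one common choice among the approaches a survey of this kind might cover.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize the text and drop stop words."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


def to_term_matrix(documents):
    """Turn raw documents into a sorted vocabulary and rows of term counts."""
    counts = [Counter(preprocess(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


if __name__ == "__main__":
    docs = ["Text mining is a field of KDD.", "Data mining and text mining"]
    vocabulary, rows = to_term_matrix(docs)
    print(vocabulary)
    for row in rows:
        print(row)
```

Each row is a fixed-length record over the same vocabulary, which is the kind of structured representation that standard data mining techniques expect.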
Similar Resources
Novel Approaches to Pre-processing Documentbase in Text Classification
Text classification is currently popular in Knowledge Discovery in Databases (KDD) and Machine Learning (ML). KDD based text classification research focuses on statistical techniques, while the ML based approach focuses on artificial-intelligence techniques. Text mining necessitates the pre-processing of the documentbase. Two broad approaches can be identified: (1) document representation and (...
CSCR010: Second Year Report
The aim of my PhD research is focused on Text Mining, one major research school in Knowledge Discovery in Databases (KDD), and in particular Text Preprocessing (TPP) for classification/categorization of documents utilizing novel algorithms for the identification of hidden patterns, rules, regularities and trends within these documents. Significant techniques in Data Mining, another well-known ...
An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches
Text coherence evaluation becomes a vital and lovely task in Natural Language Processing subfields, such as text summarization, question answering, text generation and machine translation. Existing methods like entity-based and graph-based models are engaging with nouns and noun phrases change role in sequential sentences within short part of a text. They even have limitations in global coheren...
A New Text Mining Method for Extracting User Context Information to Improve the Ranking of Search Engine Results
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, Bing, etc. need to read so many web documents and retrieve the most similar ones to the user ...
Text Analytics of Customers on Twitter: Brand Sentiments in Customer Support
Brand community interactions and online customer support have become major platforms of brand sentiment strengthening and loyalty creation. Rapid brand responses to each customer request through inbound tweets on Twitter and taking proper actions to cover the needs of customers are the key elements of positive brand sentiment creation and product or service initiative management in the realm of ...